Goto

Collaborating Authors

 observation action


OCMDP: Observation-Constrained Markov Decision Process

arXiv.org Artificial Intelligence

In many practical applications, decision-making processes must balance the costs of acquiring information with the benefits it provides. Traditional control systems often assume full observability, an unrealistic assumption when observations are expensive. We tackle the challenge of simultaneously learning observation and control strategies in such cost-sensitive environments by introducing the Observation-Constrained Markov Decision Process (OCMDP), where the policy influences the observability of the true state. To manage the complexity arising from the combined observation and control actions, we develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy. This decomposition enables efficient learning in the expanded action space by focusing on when and what to observe, as well as determining optimal control actions, without requiring knowledge of the environment's dynamics. We validate our approach on a simulated diagnostic task and a realistic healthcare environment using HeartPole. Given both scenarios, the experimental results demonstrate that our model achieves a substantial reduction in observation costs on average, significantly outperforming baseline methods by a notable margin in efficiency.


Value of structural health monitoring quantification in partially observable stochastic environments

arXiv.org Artificial Intelligence

Sequential decision-making under uncertainty for optimal life-cycle control of deteriorating engineering systems and infrastructure entails two fundamental classes of decisions. The first class pertains to the various structural interventions, which can directly modify the existing properties of the system, while the second class refers to prescribing appropriate inspection and monitoring schemes, which are essential for updating our existing knowledge about the system states. The latter have to rely on quantifiable measures of efficiency, determined on the basis of objective criteria that, among others, consider the Value of Information (VoI) of different observational strategies, and the Value of Structural Health Monitoring (VoSHM) over the entire system life-cycle. In this work, we present general solutions for quantifying the VoI and VoSHM in partially observable stochastic domains, and although our definitions and methodology are general, we are particularly emphasizing and describing the role of Partially Observable Markov Decision Processes (POMDPs) in solving this problem, due to their advantageous theoretical and practical attributes in estimating arbitrarily well globally optimal policies. POMDP formulations are articulated for different structural environments having shared intervention actions but diversified inspection and monitoring options, thus enabling VoI and VoSHM estimation through their differentiated stochastic optimal control policies. POMDP solutions are derived using point-based solvers, which can efficiently approximate the POMDP value functions through Bellman backups at selected reachable points of the belief space. The suggested methodology is applied on stationary and non-stationary deteriorating environments, with both infinite and finite planning horizons, featuring single- or multi-component engineering systems.


Active Goal Recognition

arXiv.org Artificial Intelligence

To coordinate with other systems, agents must be able to determine what the systems are currently doing and predict what they will be doing in the future---plan and goal recognition. There are many methods for plan and goal recognition, but they assume a passive observer that continually monitors the target system. Real-world domains, where information gathering has a cost (e.g., moving a camera or a robot, or time taken away from another task), will often require a more active observer. We propose to combine goal recognition with other observer tasks in order to obtain \emph{active goal recognition} (AGR). We discuss this problem and provide a model and preliminary experimental results for one form of this composite problem. As expected, the results show that optimal behavior in AGR problems balance information gathering with other actions (e.g., task completion) such as to achieve all tasks jointly and efficiently. We hope that our formulation opens the door for extensive further research on this interesting and realistic problem.


Planning with State Uncertainty via Contingency Planning and Execution Monitoring

AAAI Conferences

An example is a Mars rover: The major problem with applying POMDP approaches to thanks to low-level control and obstacle avoidance, rovers realistic planning problems like the Mars rovers is the sheer can be expected to reach their destinations reliably, and can size of the problems. Using point-based approximations and collect and communicate data, but they do not know in advance structured representations similar to those used in classical which science targets are interesting and hence will planning (Poupart 2005), problems with tens of millions provide valuable data. Similarly, robots performing tasks of states can be solved approximately, but even that corresponds such as security or cognitive assistance are generally able to to a classical planning problem with only 25 binary navigate reliably, but use unreliable vision algorithms to detect variables, which is a quite small problem by the standards the people and objects with which they are supposed of classical deterministic planning. The alternative we propose to interact. Following Besse and Chaib-draa (2009), we in this paper is to construct a series of classical deterministic will refer to problems with deterministic actions but stochastic planning problems from the quasi-deterministic observations as quasi-deterministic problems, which differ problem. By solving each of these deterministic problems from Deterministic-POMDPs (DET-POMDPS) (Bonet we construct a contingent plan--one that contains branches 2009) by taking into account of uncertainty from observation to be chosen between at run-time.


Symbolic Dynamic Programming for First-order POMDPs

AAAI Conferences

Partially-observable Markov decision processes (POMDPs) provide a powerful model for sequential decision-making problems with partially-observed state and are known to have (approximately) optimal dynamic programming solutions. Much work in recent years has focused on improving the efficiency of these dynamic programming algorithms by exploiting symmetries and factored or relational representations. In this work, we show that it is also possible to exploit the full expressive power of first-order quantification to achieve state, action, and observation abstraction in a dynamic programming solution to relationally specified POMDPs. Among the advantages of this approach are the ability to maintain compact value function representations, abstract over the space of potentially optimal actions, and automatically derive compact conditional policy trees that minimally partition relational observation spaces according to distinctions that have an impact on policy values. This is the first lifted relational POMDP solution that can optimally accommodate actions with a potentially infinite relational space of observation outcomes.